1 Packages

Packages are collections of functions in R.

To use a particular function from a package, the package needs to be installed once and then loaded in each R session.
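As a minimal sketch of the difference between calling a function through its package name and loading the package first, the example below uses `tools`, a package that ships with R, so nothing has to be downloaded. The choice of `tools` and `file_ext()` is only for illustration.

```r
# Call a function without attaching the package, using package::function
ext1 <- tools::file_ext("airway_metadata.txt")

# Attach the package with library(); its functions can then be called by name
library(tools)
ext2 <- file_ext("airway_metadata.txt")

print(ext1)
print(ext2)
```

Both calls run the same function; `library()` only saves typing the package prefix each time.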

1.1 Install a package from CRAN

# download and install 'ggplot2' package from CRAN
install.packages("ggplot2")

1.2 Introduction to Bioconductor

Bioconductor is a collection of more than 1,500 packages for the statistical analysis and comprehension of high-throughput genomic data. Originally developed for microarrays, Bioconductor packages are now used in a wide range of analyses, including bulk and single-cell RNA-seq, ChIP-seq, copy number analysis, microarray methylation and classic expression analysis, flow cytometry, and many other domains.

This session introduces the essentials of Bioconductor package discovery, installation, and use.

1.3 Use Bioconductor

This section covers discovering, installing, and learning how to use Bioconductor packages.

The web site at https://bioconductor.org contains descriptions of all Bioconductor packages, as well as essential reference material for all levels of user.
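Besides the web site, the documentation of an installed package can also be browsed from within R. The sketch below lists the vignettes (long-form tutorials) of `grid`, a package that ships with R and is used here only so that the example runs without installing anything; the same call should work with the name of any installed Bioconductor package.

```r
# List the vignettes shipped with an installed package
vigs <- vignette(package = "grid")

# The result holds a character matrix with one row per vignette
print(vigs$results[, c("Item", "Title"), drop = FALSE])
```

In an interactive session, `browseVignettes("grid")` opens the same list in a web browser and `help(package = "grid")` shows the index of help pages.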

1.4 Install from Bioconductor

install.packages("BiocManager")
BiocManager::install(c("rtracklayer", "GenomicRanges"))

2 Use case

Let's see how to do a differential expression analysis using the Bioconductor package DESeq2.

Get the required files using wget in a terminal.

# RAW Read Count Matrix
wget https://raw.githubusercontent.com/sk-sahu/learn-R/master/data/airway_count_matrix.txt
# Meta Data
wget https://raw.githubusercontent.com/sk-sahu/learn-R/master/data/airway_metadata.txt

The data above are derived from the Bioconductor experiment data package airway, which is part of this publication.

General commands to run DESeq2

# Install DESeq2 (needed only once)
BiocManager::install("DESeq2")
# Load DESeq2 (needed in every session)
library(DESeq2)
# Set Working Directory
setwd("your/working/dir/path")
# Import data (the files downloaded above)
count_data <- read.table("airway_count_matrix.txt", header = TRUE, row.names = 1)
metadata <- read.table("airway_metadata.txt", header = TRUE, row.names = 1)
# Check that the metadata rows match the count matrix columns (should be TRUE)
all(rownames(metadata) == colnames(count_data))
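# Build DESeq object
# The design formula must name a column of metadata; replace `conditions`
# with your own column (e.g. `dex`, which the PCA plot below groups by)
dds <- DESeqDataSetFromMatrix(countData = count_data,
                              colData = metadata,
                              design = ~ conditions)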
# Filter out genes with low read counts (fewer than 10 reads in total)
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]
# Run Differential Analysis
dds <- DESeq(dds, parallel = FALSE)
# Results
res <- results(dds)
# Visualize Results
plotMA(res)
abline(h=c(-1,1), col="dodgerblue", lwd=2)
# filter only significant genes (padj <= 0.05)
res_sig <- subset(res, padj <= 0.05)
# export the significant results to a file
write.csv(res_sig, "DE_results.csv")
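
The sample-name check and the low-count filter above can be tried on a small made-up data set without installing DESeq2. The sketch below uses only base R; the gene names, sample names, and counts are invented for illustration, and the metadata rows are deliberately shuffled to show what a failed check looks like and one way to repair it.

```r
# Toy count matrix: 4 genes x 3 samples (values are invented)
count_data <- data.frame(
  s1 = c(0, 12, 3, 50),
  s2 = c(1, 20, 2, 60),
  s3 = c(0, 15, 1, 55),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)

# Toy metadata with rows deliberately in a different order
metadata <- data.frame(
  conditions = c("treated", "control", "control"),
  row.names = c("s3", "s1", "s2")
)

# Same samples, but the order differs, so the check fails
same_order <- all(rownames(metadata) == colnames(count_data))

# Reorder the metadata rows to follow the count matrix columns
metadata <- metadata[colnames(count_data), , drop = FALSE]
fixed_order <- all(rownames(metadata) == colnames(count_data))

# Keep genes with at least 10 reads summed over all samples
keep <- rowSums(count_data) >= 10
filtered <- count_data[keep, ]

print(same_order)
print(fixed_order)
print(rownames(filtered))
```

If the check is skipped and the order differs, samples would be paired with the wrong condition, so it is worth running before building the DESeq object.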

# Normalization and Visualization
## rlog transformation
rld <- rlog(dds)
## Box plots (a pseudocount of 1 avoids log2(0) for zero counts)
boxplot(log2(count_data + 1), ylab = "Log2(Read Count + 1)", main = "Before Normalization", las = 2)
boxplot(assay(rld), ylab = "rlog-transformed counts", main = "After Normalization", las = 2)
## PCA plot
plotPCA(rld, intgroup = c("cell", "dex"))
## Expression heatmap for 50 significant genes
sig_gene_list <- head(rownames(res_sig), 50)
heatmap(assay(rld)[sig_gene_list, ])
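
The heatmap step takes genes in the order they appear in the results, which is not necessarily the order of significance. As a hedged sketch of one way to pick the most significant genes instead, the example below sorts a small made-up results table by adjusted p-value using base R only. The table `res_toy` and its values are invented; the `NA` entry stands in for genes to which DESeq2 assigns no adjusted p-value.

```r
# Toy results table (values are invented)
res_toy <- data.frame(
  log2FoldChange = c(2.1, -0.3, 1.5, -2.8, 0.1),
  padj = c(0.001, 0.40, 0.03, 0.0002, NA),
  row.names = c("geneA", "geneB", "geneC", "geneD", "geneE")
)

# which() drops rows where padj is NA, so they are not selected
sig <- res_toy[which(res_toy$padj <= 0.05), ]

# Sort from most to least significant
sig <- sig[order(sig$padj), ]

# head() returns at most n names, even if fewer genes are significant
top_genes <- head(rownames(sig), 2)
print(top_genes)
```

The same pattern, sorting by `padj` before taking the first rows, should carry over to the DESeq2 results object.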
 

Created and Maintained by Sangram Keshari Sahu
Licensed under CC-BY 4.0
Source Code At GitHub
Template used from Rmdplates package